Streaming responses are implemented by setting the stream parameter to true in the API request, which returns a stream of Server-Sent Events (SSE) that the client processes to display text incrementally.
Streaming allows you to display the model's response in real-time, character by character or token by token, rather than waiting for the entire output to be generated. This dramatically improves the perceived speed and responsiveness of your application, especially for longer responses. Both the OpenAI and Anthropic APIs support this by setting stream: true in the request body .
Instead of waiting for a complete JSON response, the API returns a stream of Server-Sent Events (SSE). The client iterates over this stream, receiving data chunks as they are generated. For each chunk, you can parse the content, extract the new text fragment (often within a delta object), and append it to the UI. This process continues until the stream ends . This technique is essential for creating a fluid, chat-like user experience.